A Collaborative Approach to Computational Reproducibility
Although a standard in natural science, reproducibility has been only
episodically applied in experimental computer science. Scientific papers often
present a large number of tables, plots and pictures that summarize the
obtained results, but then loosely describe the steps taken to derive them. Not
only can the methods and the implementation be complex, but also their
configuration may require setting many parameters and/or depend on particular
system configurations. While many researchers recognize the importance of
reproducibility, the challenge of making it happen often outweighs the benefits.
Fortunately, a plethora of reproducibility solutions have been recently
designed and implemented by the community. In particular, packaging tools
(e.g., ReproZip) and virtualization tools (e.g., Docker) are promising
solutions towards facilitating reproducibility for both authors and reviewers.
To address the incentive problem, we have implemented a new publication model
for the Reproducibility Section of Information Systems Journal. In this
section, authors submit a reproducibility paper that explains in detail the
computational assets from a previously published manuscript in Information
Systems
Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar
Automatic machine learning is an important problem at the forefront of
machine learning. The strongest AutoML systems are based on neural networks,
evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached
state-of-the-art results with an order of magnitude speedup using reinforcement
learning with self-play. In this work we extend AlphaD3M by using a pipeline
grammar and a pre-trained model which generalizes from many different datasets
and similar tasks. Our results demonstrate improved performance compared with
our earlier work and existing methods on AutoML benchmark datasets for
classification and regression tasks. In the spirit of reproducible research we
make our data, models, and code publicly available.
Comment: ICML Workshop on Automated Machine Learnin
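To make the idea of a pipeline grammar concrete, the following is a minimal sketch of how a context-free grammar can restrict the space of candidate pipelines. The grammar rules and primitive names (`imputer`, `scaler`, and so on) are hypothetical and are not taken from AlphaD3M; the sketch only shows how a grammar limits a search to well-formed pipelines.

```python
from itertools import product

# Hypothetical toy grammar: nonterminals map to lists of productions.
# Symbols that are not keys of GRAMMAR are terminals (primitive names).
GRAMMAR = {
    "PIPELINE": [["CLEAN", "ESTIMATOR"], ["CLEAN", "TRANSFORM", "ESTIMATOR"]],
    "CLEAN": [["imputer"]],
    "TRANSFORM": [["scaler"], ["pca"]],
    "ESTIMATOR": [["random_forest"], ["linear_model"]],
}


def expand(symbol):
    """Return every sequence of terminals derivable from `symbol`."""
    if symbol not in GRAMMAR:
        return [[symbol]]
    results = []
    for production in GRAMMAR[symbol]:
        # Expand each symbol of the production, then combine the alternatives.
        parts = [expand(s) for s in production]
        for combo in product(*parts):
            results.append([prim for part in combo for prim in part])
    return results


pipelines = expand("PIPELINE")
for p in pipelines:
    print(" -> ".join(p))
```

Under this toy grammar, a pipeline that places an estimator before data cleaning can never be generated, so a search procedure does not spend evaluations on it.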
AlphaD3M: An Open-Source AutoML Library for Multiple ML Tasks
Peer reviewed. We present AlphaD3M, an open-source Python library that supports a wide range of machine learning tasks over different data types. We discuss the challenges involved in supporting multiple tasks and how AlphaD3M addresses them by combining deep reinforcement learning and meta-learning to effectively construct pipelines over a large collection of primitives. To better integrate the use of AutoML within the data science lifecycle, we have built an ecosystem of tools around AlphaD3M that support user-in-the-loop tasks, including selecting suitable pipelines and developing custom solutions for complex problems. We present use cases that demonstrate some of these features. We report the results of a detailed experimental evaluation showing that AlphaD3M is effective and derives high-quality pipelines for a diverse set of problems, with performance comparable or superior to that of state-of-the-art AutoML systems.
AlphaD3M: Machine Learning Pipeline Synthesis
Peer reviewed. We introduce AlphaD3M, an automatic machine learning (AutoML) system based on
meta reinforcement learning using sequence models with self-play. AlphaD3M is
based on edit operations performed over machine learning pipeline primitives,
which provides explainability. We compare AlphaD3M with state-of-the-art AutoML
systems: Autosklearn, Autostacker, and TPOT, on OpenML datasets. AlphaD3M
achieves competitive performance while being an order of magnitude faster,
reducing computation time from hours to minutes, and is explainable by design
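As an illustration of what edit operations over pipeline primitives might look like, here is a minimal sketch, assuming a pipeline is represented as a list of primitive names. The operation names (`insert`, `delete`, `replace`) and the primitives are assumptions made for illustration, not AlphaD3M's actual action space.

```python
def apply_edit(pipeline, edit):
    """Return a new pipeline with one edit applied; the input is not mutated."""
    op, index, primitive = edit
    result = list(pipeline)
    if op == "insert":
        result.insert(index, primitive)
    elif op == "delete":
        del result[index]
    elif op == "replace":
        result[index] = primitive
    else:
        raise ValueError(f"unknown edit operation: {op}")
    return result


def apply_edits(pipeline, edits):
    """Apply edits in order, keeping every intermediate pipeline as a trace."""
    trace = [list(pipeline)]
    for edit in edits:
        trace.append(apply_edit(trace[-1], edit))
    return trace


# Hypothetical edit sequence starting from a single-estimator pipeline.
edits = [
    ("insert", 0, "imputer"),
    ("replace", 1, "random_forest"),
    ("insert", 1, "scaler"),
]
trace = apply_edits(["linear_model"], edits)
for step in trace:
    print(step)
```

Keeping the full trace means each final pipeline comes with a readable record of the steps that produced it, which is one way an edit-based search can be inspected step by step.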
ReproZip: 1.0.8
Behavior changes:
No longer default to overwriting trace directories. ReproZip will ask what to do or exit with an error if one of --continue/--overwrite is not provided
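The new behavior can be pictured with a minimal sketch, assuming a non-interactive run. This is not ReproZip's own code: the function name `resolve_trace_dir`, the returned action strings, and the `mode` parameter are hypothetical, and the interactive prompt mentioned above is left out.

```python
import os
import tempfile


def resolve_trace_dir(path, mode=None):
    """Decide what to do with a trace directory.

    `mode` is None, "continue", or "overwrite", mirroring the two flags.
    Returns a string naming the action to take.
    """
    if not os.path.isdir(path):
        return "create"          # nothing exists yet, so nothing can be lost
    if mode == "continue":
        return "append"          # keep the existing trace and add to it
    if mode == "overwrite":
        return "replace"         # discard the existing trace
    # An existing directory is no longer silently overwritten.
    raise RuntimeError(
        f"trace directory {path!r} exists; pass --continue or --overwrite"
    )


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "trace")  # example directory name
    os.mkdir(existing)
    print(resolve_trace_dir(existing, "continue"))
    print(resolve_trace_dir(os.path.join(tmp, "new-trace")))
```

The point of the design is that destroying an earlier trace always requires an explicit choice from the user.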
Bugfixes:
Fix an issue identifying Debian packages when a file's in two packages
Fix Python error "Mixing iteration and read methods would lose data"
Fix reprounzip info showing some numbers as 0 instead of hiding them in non-verbose mode
Another fix to X server IP determination for Docker
Enhancements:
New GUI for reprounzip, allowing one to unpack without using the command-line
Add filters to remove some common file types from packed files (.pyc) or detected input files (.py, .so, ...)
Add JSON output format to reprounzip info
Allow using the VirtualBox display to reproduce X11-enabled experiments
Downloads:
reprozip (tarball)
reprounzip (wheel, tarball)
reprounzip-docker (wheel, tarball)
reprounzip-vagrant (wheel, tarball)
reprounzip-vistrails (wheel, tarball)
reprounzip-qt 0.1 (wheel, tarball)
Windows installer (Python 2.7, reprounzip, plugins and GUI)
Mac installer (Python 2.7, reprounzip, plugins and GUI)
uvcdat: UV-CDAT 2.6
The UV-CDAT team is pleased to announce the release of UV-CDAT version 2.6.
Many thanks to users, testers, and developers for helping UV-CDAT to reach this milestone. This is a bug fix release: we have fixed several major and minor bugs in version 2.6 and therefore strongly recommend that users upgrade their UV-CDAT installation.
From this release on, UV-CDAT is distributed via conda:
conda install -c uvcdat uvcdat
or
conda create -n uvcdat-2.6 -c uvcdat uvcdat
We also alert users to an Askbot website to help the UV-CDAT user community. This supports version 2.2 onward. See: http://uvcdat.askbot.co